Binary Embedding with Additive Homogeneous Kernels

Authors

  • Saehoon Kim
  • Seungjin Choi
Abstract

Binary embedding transforms vectors in Euclidean space into the vertices of Hamming space such that the Hamming distance between binary codes reflects a particular distance metric. In machine learning, similarity metrics induced by Mercer kernels are frequently used, leading to the development of binary embedding with Mercer kernels (BE-MK), in which approximate nearest neighbor search is performed in a reproducing kernel Hilbert space (RKHS). Kernelized locality-sensitive hashing (KLSH), one of the representative BE-MK methods, uses kernel PCA to embed data points into a Euclidean space, followed by random hyperplane binary embedding. In general, it works well when the query and the data points in the database follow the same probability distribution. A streaming data environment, however, continuously requires KLSH to update the leading eigenvectors of the Gram matrix, which can be costly or hard to carry out in practice. In this paper we present a completely randomized binary embedding that works with a family of additive homogeneous kernels, referred to as BE-AHK. The proposed algorithm is easy to implement and builds on Vedaldi and Zisserman's work on explicit feature maps for additive homogeneous kernels. We show that BE-AHK preserves kernel values by developing upper and lower bounds on its Hamming distance, which guarantees that approximate nearest neighbor search can be solved efficiently. Numerical experiments demonstrate that BE-AHK yields similarity-preserving binary codes with respect to additive homogeneous kernels and is superior to existing methods when training data and queries are generated from different distributions. Moreover, when a large code size is allowed, its performance is comparable to that of KLSH in general cases.

Introduction

Binary embedding (BE) refers to methods that transform examples in $\mathbb{R}^d$ into the vertices of Hamming space, i.e., $\{0,1\}^k$, such that the normalized Hamming distance between binary codes preserves a particular distance measure, including angular distance (Charikar 2002) and kernel-induced distance (Kulis and Grauman 2009; Li, Samorodnitsky, and Hopcroft 2012; Raginsky and Lazebnik 2009). Most notably, random hyperplane binary embedding (Charikar 2002) involves random projection followed by binary quantization, and aims to preserve the angular distance between two vectors. Randomized binary embedding (RBE) seeks to construct an embedding function without requiring any training data points. In contrast, data-dependent binary embedding (DBE) makes use of a training set to construct compact binary codes (Weiss, Torralba, and Fergus 2008; Gong and Lazebnik 2011; Li et al. 2014). We observe that DBE performs poorly when the query and training data points are generated from different distributions. Recently, online DBE (Huang, Yang, and Zhang 2013; Leng et al. 2015) has been proposed to learn an embedding function sequentially for large-scale or streaming data, but it still incurs the overhead of re-computing all binary codes whenever a new point arrives. It is therefore necessary to develop RBE for different types of streaming data. For example, χ² and intersection kernels are frequently used as distance metrics for histograms, which makes it necessary to develop RBE for such kernels.
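For concreteness, here is a minimal NumPy sketch of the two histogram kernels just mentioned, the additive χ² kernel and the intersection kernel; the function names and example histograms are ours, not the paper's:

```python
import numpy as np

def chi2_kernel(x, y):
    """Additive chi-squared kernel: sum_i 2 * x_i * y_i / (x_i + y_i)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    num, den = 2.0 * x * y, x + y
    # A 0/0 term means both bins are empty; treat it as contributing 0.
    terms = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return float(terms.sum())

def intersection_kernel(x, y):
    """Histogram intersection kernel: sum_i min(x_i, y_i)."""
    return float(np.minimum(x, y).sum())

# Two L1-normalized histograms.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
print(chi2_kernel(p, q))          # ~0.99 (equals 1 only when p == q)
print(intersection_kernel(p, q))  # 0.9
```

Both kernels compare histograms bin by bin; this additive, per-coordinate structure is exactly what BE-AHK exploits.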
Binary embedding with Mercer kernels (BE-MK) (Kulis and Grauman 2009; Raginsky and Lazebnik 2009; Mu et al. 2014; Jiang, Que, and Kulis 2015) employs a feature map (kernel PCA or the Nyström approximation) followed by RBE, such that the normalized Hamming distance between codes preserves Mercer kernels. Since it requires training examples to build the feature map, it may not be adequate for a streaming environment. For example, kernelized locality-sensitive hashing (KLSH) (Jiang, Que, and Kulis 2015), a representative example of BE-MK, employs KPCA as the feature map, which requires a set of training data points to compute the leading eigenvectors of the Gram matrix. If the data distribution changes over time, the performance of KLSH steadily degrades.

In this paper, we propose a completely randomized binary embedding with additive homogeneous kernels, referred to as BE-AHK, in which data points are first embedded into a Euclidean space by the explicit feature map for additive homogeneous kernels (Vedaldi and Zisserman 2012) and then transformed into the vertices of Hamming space by random hyperplane binary embedding. The contributions of this paper are summarized below.

  • We propose an RBE algorithm for additive homogeneous kernels and conduct numerical experiments showing that the proposed algorithm is superior to existing BE-MK methods when training data and queries are generated from different distributions.

  • We present lower and upper bounds on the Hamming distance between binary codes generated by the proposed algorithm, which guarantees that approximate nearest neighbor search and large-scale machine learning problems can be solved efficiently.

Background

In this section, we briefly review the prerequisites needed to describe the proposed algorithm.

Random Hyperplane Binary Embedding

Random hyperplane binary embedding (Charikar 2002), referred to as RHBE, involves a random projection followed by binary quantization. Its embedding function is formally defined as

$$h(\mathbf{x}) \triangleq \mathrm{sgn}(\mathbf{w}^\top \mathbf{x}),$$

where $\mathbf{w} \in \mathbb{R}^d$ is a random vector sampled from the unit $d$-sphere and $\mathrm{sgn}(\cdot)$ is the sign function, which returns 1 whenever its input is nonnegative and −1 otherwise. It was shown in (Charikar 2002) that RHBE naturally gives an unbiased estimator of angular distance, in the sense that the expected Hamming distance equals the normalized angle between two vectors, i.e.,

$$\mathbb{E}\big[\,\mathbb{I}[h(\mathbf{x}) \neq h(\mathbf{y})]\,\big] = \frac{\theta_{\mathbf{x},\mathbf{y}}}{\pi}, \qquad (1)$$

where $\mathbb{I}[\cdot]$ is an indicator function that returns 1 whenever its argument is true and 0 otherwise, and $\theta_{\mathbf{x},\mathbf{y}}$ denotes the angle between the two vectors. It is easy to verify that RHBE is an $\big(r,\ r(1+\epsilon),\ 1-\tfrac{r}{\pi},\ 1-\tfrac{r(1+\epsilon)}{\pi}\big)$-sensitive locality-sensitive hashing family (Indyk and Motwani 1998; Gionis, Indyk, and Motwani 1999), leading to an $O(n^{\rho})$-time algorithm for the approximate nearest neighbor search problem, where

$$\rho = \frac{\log\big(1-\frac{r}{\pi}\big)}{\log\big(1-\frac{r(1+\epsilon)}{\pi}\big)}.$$
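The identity in (1) is easy to check empirically. The sketch below is our own illustration, not the paper's code; it draws k Gaussian hyperplane normals, whose directions are uniform on the unit sphere, and compares the normalized Hamming distance between two codes with θ/π:

```python
import numpy as np

rng = np.random.default_rng(0)

def rhbe(x, W):
    """Random hyperplane binary embedding: one bit per row of W."""
    return (W @ x >= 0).astype(np.uint8)   # sgn(w^T x), mapped to {0, 1}

d, k = 64, 4096                            # input dimension, code length
# Gaussian rows give hyperplane normals whose directions are uniform
# on the unit d-sphere, which is all that RHBE requires.
W = rng.standard_normal((k, d))

x = rng.standard_normal(d)
y = rng.standard_normal(d)

hamming = np.mean(rhbe(x, W) != rhbe(y, W))   # normalized Hamming distance
theta = np.arccos(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
print(hamming, theta / np.pi)                 # close for large k, by (1)
```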
Binary Embedding with Mercer Kernels

Binary embedding with Mercer kernels (BE-MK) (Kulis and Grauman 2009; Mu et al. 2014; Jiang, Que, and Kulis 2015) employs a feature map followed by RBE, defined as follows:

$$h(\mathbf{x}) = \mathrm{sgn}\big(\mathbf{w}^\top \phi(\mathbf{x})\big), \qquad (2)$$

where $\phi(\cdot)$ is a feature map (kernel PCA or the Nyström approximation) for a particular Mercer kernel and $\mathbf{w}$ is a random vector of the same dimensionality as $\phi(\mathbf{x})$. For example, kernelized locality-sensitive hashing (KLSH) (Kulis and Grauman 2009; Jiang, Que, and Kulis 2015) for a kernel $k(\cdot,\cdot)$ involves the following feature map:

$$\phi(\mathbf{x}) \triangleq \mathbf{U}_k \big[k(\mathbf{x}^*_1, \mathbf{x});\ \cdots;\ k(\mathbf{x}^*_m, \mathbf{x})\big], \qquad (3)$$

where $\mathbf{x}^*_1, \ldots, \mathbf{x}^*_m$ are $m$ sampled data points and $\mathbf{U}_k$ is formed from the $k$ leading eigenvectors of their Gram matrix.

Instead of such data-dependent embeddings, fully randomized binary embeddings have been developed to preserve particular Mercer kernels, including the χ² kernel (Li, Samorodnitsky, and Hopcroft 2012) and shift-invariant kernels (Raginsky and Lazebnik 2009).

Table 1: Additive homogeneous kernels with closed-form feature maps.

  kernel        | k(x, y)      | Φ_w(x)
  Hellinger's   | √(xy)        | √x
  χ²            | 2xy/(x+y)    | e^{−iw log x} √(x sech(πw))
  intersection  | min{x, y}    | e^{−iw log x} √((2x/π) · 1/(1 + 4w²))

For existing algorithms, we observe the following limitations:

  • Data-dependent binary embedding requires a training dataset to construct its feature map, resulting in poor performance when the query and training data are generated from very different distributions.

  • To our knowledge, there is no completely randomized algorithm that works with a large family of kernels. For example, additive homogeneous kernels, which include very common kernels (e.g., the χ² and intersection kernels), have not been considered in RBE.

Explicit Feature Maps for Additive Homogeneous Kernels

Additive homogeneous kernels are a family of Mercer kernels built from scalar kernels $k : \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}$ that are homogeneous, i.e., $k(cx, cy) = c\,k(x, y)$ for all $c \ge 0$; the additive kernel between two histograms is then $K(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{d} k(x_i, y_i)$.
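The sketch below (our illustration, not the paper's implementation) shows how such an explicit feature map combines with RHBE into the BE-AHK pipeline described above, using the sampled approximation of the χ² feature map from Vedaldi and Zisserman (2012); the sampling parameters n = 2 and step L = 0.5 and all names are our choices:

```python
import numpy as np

def chi2_feature_map(x, n=2, L=0.5):
    """Sampled explicit feature map for the chi^2 kernel, following
    Vedaldi and Zisserman (2012): each bin x_i is expanded into 2n+1
    coordinates with envelopes sqrt(x * L * sech(pi * j * L))."""
    x = np.asarray(x, dtype=float)
    kappa = lambda w: 1.0 / np.cosh(np.pi * w)   # chi^2 kernel signature
    logx = np.log(np.where(x > 0, x, 1.0))       # dummy log value at x = 0
    feats = [np.sqrt(x * L * kappa(0.0))]
    for j in range(1, n + 1):
        amp = np.sqrt(2.0 * x * L * kappa(j * L))
        feats.append(amp * np.cos(j * L * logx))
        feats.append(amp * np.sin(j * L * logx))
    out = np.stack(feats, axis=-1)               # shape (d, 2n+1)
    out[x == 0] = 0.0                            # empty bins map to 0
    return out.ravel()

rng = np.random.default_rng(0)
d, n, k = 3, 2, 1024
W = rng.standard_normal((k, d * (2 * n + 1)))    # random hyperplanes

def be_ahk(x):
    """BE-AHK sketch: explicit feature map followed by RHBE."""
    return (W @ chi2_feature_map(x, n=n) >= 0).astype(np.uint8)

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
# The feature-map inner product approximates the additive chi^2 kernel ...
print(chi2_feature_map(p, n=n) @ chi2_feature_map(q, n=n),
      np.sum(2 * p * q / (p + q)))
# ... so the normalized Hamming distance tracks the kernel-induced angle.
print(np.mean(be_ahk(p) != be_ahk(q)))
```

Larger n and a well-chosen L tighten the kernel approximation at the cost of a higher-dimensional intermediate embedding; the binary code length k is independent of that choice.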

Related Articles

Generalized RBF feature maps for Efficient Detection

These kernels combine the benefits of two other important classes of kernels: the homogeneous additive kernels (e.g. the χ2 kernel) and the RBF kernels (e.g. the exponential kernel). However, large scale problems require machine learning techniques of at most linear complexity and these are usually limited to linear kernels. Recently, Maji and Berg [2] and Vedaldi and Zisserman [4] proposed exp...


Fast Image Search with Locality-Sensitive Hashing and Homogeneous Kernels Map

Fast image search with efficient additive kernels and kernel locality-sensitive hashing has been proposed. To accommodate kernel functions, recent work has explored methods for creating locality-sensitive hashing that guarantee linear-time search; however, existing methods still do not solve the problem of the locality-sensitive hashing (LSH) algorithm and indirectly sacrifice accur...


Better multiclass classification via a margin-optimized single binary problem

We develop a new multiclass classification method that reduces the multiclass problem to a single binary classifier (SBC). Our method constructs the binary problem by embedding smaller binary problems into a single space. A good embedding will allow for large margin classification. We show that the construction of such an embedding can be reduced to the task of learning linear combinations of k...


On the relation between universality, characteristic kernels and RKHS embedding of measures

Universal kernels have been shown to play an important role in the achievability of the Bayes risk by many kernel-based algorithms that include binary classification, regression, etc. In this paper, we propose a notion of universality that generalizes the notions introduced by Steinwart and Micchelli et al. and study the necessary and sufficient conditions for a kernel to be universal. We show ...


Weighted Inequalities on Morrey Spaces for Linear and Multilinear Fractional Integrals with Homogeneous Kernels

In this paper, we consider weighted inequalities for linear and multilinear fractional integrals with homogeneous kernels on Morrey spaces. Weighted inequalities without homogeneous kernels were recently proved by the authors; here, we generalize them to the homogeneous-kernel setting.


Publication year: 2017